Abstract

Three Amino-acid Loop Extension (TALE) family homeoproteins are involved in plant growth and development can have potential to promote resistance in response to various abiotic or biotic stresses. This class of proteins has been widely studied in many plant species. In current study, in silico analyses were performed to reveal the putative role of TALE transcription factors in stress tolerance and organ development. A total of 89 proteins were identified in G. hirsutum, which were further categorized into 17 subfamilies such as BLH1, BLH2, BLH6, BLH7, BLH8, BLH9, BLH11, BEL1, KNAT2, KNAT3, KNAT4, KNAT6, KNAT7, KN1, OSH6, H1 and ATH1. Biological information analysis showed that amino acid length is 250–850 aa among TALE proteins and molecular weight ranges between 30–90 kDa, persistent with characterization of TALE protein itself. Chromosomal mapping showed similar pattern of distribution of genes on A and D sub-genome of tetraploid cotton. Gene structure and motif distribution analysis revealed that each cluster of TALE proteins have identical intron-exon number and motif pattern. Among 89 TALE proteins, 34 gene pairs in G. hirsutum showed the segmental duplication with Ka/Ks value less than 1. Expression analysis confirmed up-regulated expression of TALE members belonging to subfamilies KNAT3, KNAT4, BEL1 and BLH1 under various abiotic stresses, suggested their putative role in response to stresses. Furthermore, significant expression level of these genes in various plant tissues also predicted their role in plant development. Promoter analysis illuminated the presence of several tissue specific or developmental plant hormone and stress responsive cis- elements in the promoter region of TALE members in G. hirsutum. In conclusion, comprehensive analysis of TALE transcription factor family revealed its potential candidature for breeding stress tolerant genotypes and exploring its role in organ development of cotton. © 2020 Friends Science Publishers

Keywords: Cotton; Phylogenetic tree; Gene expression; Gene duplication; cis-element analysis


Introduction

Transcription factors (TFs) are proteins involved in regulation of gene functions, either by enhancing the transcription of target genes or by repressing the transcriptional activities of genes in plants (Gao et al. 2014; Yuan et al. 2018). In plants, transcription factors are mainly involved in various biological processes like growth and organ development or regulatory response to biotic and abiotic stresses (Osorio et al. 2012). More than 60 transcription factor families have been reported that are involved in functional activities in plants, as their transcriptional activity is achieved through the binding of DNA binding domain to the characteristic sequences i.e., cis- regulatory elements, found in promoter regions of the genes, which are being regulated by these TFs (Nuruzzaman et al. 2010; Yao et al. 2014).

In higher organisms, homeobox genes are reported to be involved in differentiation of body parts during early phases of embryogenesis. These genes contain a conserved 60 amino acid (aa) long DNA- binding domain called as homeodomain (HD) (Nam and Nei 2005). The three dimensional structure of HD composed of three alpha- helices, in which the second and third helices form a helix- turn-helix DNA-binding motif (Desplan et al. 2008). Homeobox genes are categorized into two distinct classes based on conserved amino-acid sequence and characteristic HD motifs. One of the two is “typical” HD, specified by a length of 60 amino-acid while the other one is “atypical” HD, identified by the variation in the length of conserved amino-acid sequence (Bruglin 1994). TALE (Three Amino-acid Loop Extension) is an “atypical” HD containing 63 aa, with three extra residue extension between first and second helix (Chen et al. 2003).

In plants, TALE superclass proteins are divided into two different families i.e., KNOTTED1-like homeobox (KNOX) and BELL1-like homeobox (BELL or BLH) genes. KNOX proteins consist of KNOX domain at N- terminus with subdomains KNOX1 and KNOX2, and also ELK domain followed by the HD at the C-terminus. KNOX protein has significant sequence similarity with the MEINOX subclass of TALE superfamily in animals, suggested their presence before the divergence of animal and plant lineages about 1.6 billion years ago (Bürglin 1997). BELL proteins have SKY and BELL domains at the N- terminus and HD at the C-terminus. These domains are responsible for heterodimerization with the KNOX proteins. The formation of this heterodimer suggests that BELL and KNOX function in the same manner (Bellaoui et al. 2007).

On the basis of gene expression and sequence similarity, KNOX family is classified into three classes (Magnani and Hake 2008). First class include STM, BP/KNAT1, KNAT2 and KNAT6, second class has KNAT3, 4, 5 and 7 and third class includes KNATM, can interact with other TALE members to regulate their activity (Kimura et al. 2008; Magnani and Hake 2008). The BELL family consists of RPL/BLH9, PNF/BLH8, ATH1, SAW1/BLH2, SAW2/BLH8 and BEL1- with well- defined function. However, function of BLH1, BLH3, BLH5, BLH6, BLH7, BLH10 and BLH11 is still unknown (Hamant and Pautot 2010). The interaction of KNOX and BELL proteins is important for their nuclear localization and their binding affinity to DNA (Kim et al. 2013).

The TALE homeoproteins are reported to be involved in meristem formation and maintenance, organ morphogenesis, positioning of organs and several characteristics of the reproductive phase. Loss of function due to mutation in class1 KNOX genes helps in determining the cell fate and patterning of meristem like abnormal leave and flowers development in maize (Kerstetter et al. 1997). Gain of function in mutant will explained their capability to change the plant morphology as in maize, tomato, tobacco and Arabidopsis, it resulted into knotted or lobbed leaves (Reiser et al. 2000). Ectopic KNOX gene expression leads to the compound leaves formation in tomato (Hareven et al. 1996). In potato and tobacco, negative regulation of Gibberellin (GA) biosynthesis in meristem is controlled by KNOX proteins. It is also reported that KNOX proteins interact with BELL protein in order to regulate hormone homeostasis (Hake et al. 2004). Another study in soybean revealed that KNOX protein was up-regulated in root nodules and actively take part in its development related biological processes. BEL1 type gene SH5 is responsible for seed shattering in soybean by developing abscission zone and suppression of lignin biosynthesis (Yoon et al. 2014). A gene GhKNL1 is notified in cotton belonging to class2 KNOX (KNOTTED-LIKE) proteins, and is specifically expressed during secondary cell wall (SCW) biosynthesis. Moreover, dominant repression and overexpression of GhKNL1 gene in Arabidopsis produced a reduction in interfascicular fibre cell-wall thickening of basal stem of transgenic pants. All of these studies concluded that as a TF, GhKNL1 is involved in regulating the fibre development in cotton (Gong et al. 2014). Similarly, a study revealed that TALE genes are involved in regulating the secondary cell wall biosynthesis of cotton fibre (Ma et al. 2019).

A significant amount of research has been conducted to increase the productivity of cotton crop but biotic and abiotic stresses always remained potential threats in hampering the yield especially in the prevailing scenario of climate change. In order to compete with these yield-declining factors, many efforts have been done at molecular level to induce resistance or tolerance against such stresses. TALE family proteins exhibit significant role in regulating developmental processes and hormone homeostasis in various stress environment. Heterodimerization of KNAT3 and BLH1 genes was investigated to be involved in up-regulation of ABA in response to salinity in Arabidopsis (Kim et al. 2013). ABA responsive elements in Arabidopsis, were revealed to be involved in osmotic stress tolerance (Kim et al. 2011). Recent study in poplar implied the significant role of TALE family in salt stress through tissue-differential expression (Zhao et al. 2019). Recently published cotton genome of tetraploid (G. hirsutum) and diploid (G. arboreum and G. raimondii) cotton species allowed us to comprehensively identify and characterize TALE family proteins in cotton. In this study, we discerned 89, 45 and 45 candidate TALE proteins in G. hirsutum, G. arboreum and G. raimondii respectively. We performed their gene structure, conserved domains, protein motifs and phylogenetic analysis, chromosomal mapping, gene duplication events, expression profiling of candidate genes under various environmental stresses and their cis-element analysis. Our results highlighted the role and evolutionary history of TALE proteins and serve as a platform for understanding molecular and biological importance of these proteins in developing the stress tolerant cotton genotypes.

Materials and Methods

Identification of TALE family in upland cotton

We downloaded TALE family amino acid sequences of A. thaliana, T. cacao, G. max, V. venifera, C. papaya, G. arboreum and G. raimondii from Plant Transcription Factor Database (PlantTFD) (http://planttfdb.cbi.pku.edu.cn/) (Jin et al. 2017). A local BLASTP search with e-values of 1e-10 was run to identify candidate TALE genes in G. hirsutum, using protein sequences of Arabidopsis, cacao, soybean, papaya, V. venifera, G. arboreum and G. raimondii as query. Protein sequences of final TALE genes resulting from BLAST search, were retrieved from Cotton Functional Genomics Database (CottonFGD) (https://cottonfgd.org/analyze/). Moreover, the protein sequence of selected cotton TALE genes were investigated by domain analysis with Pfam database (Protein family: https://pfam.xfam.org/) (Finn et al. 2014), and SMART (Simple Modular Architecture Research Tool: http://smart.embl-heidelberg.de/) (Letunic et al. 2015). Finally, redundant sequences were removed manually, resulting 89 TALE amino acid sequences. Physiochemical properties were examined using ExPASy server tool (https://web.expasy.org/protparam/) (Artimo et al. 2012). Cellular and subcellular localization of TALE proteins was predicted by an online software PorComp9.0 (http:// www.softberry.com/berry.phtml?group=programs&subgroup=proloc&topic=protcomppl) (Briesemeister et al. 2010).

Phylogenetic analysis

The phylogenetic analysis of TALE genes was performed using full length TALE amino acid sequences of G. hirsutum, Arabidopsis, G. arboreum, G. raimondii and cacao, which were first aligned using ClustalW in MEGA 6.0 software (Li 2003). Then, the multiple sequence alignment of protein sequences was used to conduct phylogenetic analysis, and an unrooted tree was created using the Neighbour-Joining (NJ) method with parameters as follows: complete deletion; bootstrap value with 1000 replicates; Poisson Model; complete deletion, the p- distance, data subset to use; Substitution. In addition, a separate phylogenetic tree was constructed using TALE protein sequence in G. hirsutum for understanding of further analysis.

Chromosomal mapping and duplication analysis

The positions of TALE genes on chromosomes of tetraploid cotton were determined through genes features data available in CottonFGD (https://cottonfgd.org/). Map Chart software was used to map GohirTALE genes on the chromosomes. For duplication analysis, BLASTP was performed for G. hirsutum using protein sequence of G. hirsutum itself as query. Gene duplications were distinguished based on the similarity > 80% and at least alignment percent > 80% compared to the total lengths of proteins (Yang et al. 2008). For further estimation of duplication events, synonymous (Ks) and non-synonymous (Ka) substitution rates were computed by Clustal Omega (https://www.ebi.ac.uk/Tools/ msa/clustalo/) (Sievers and Higgins 2018) and alignment of the paired genes analyzed by PAL2NAL (http://www. bork.embl.de/pal2nal/) (Suyama et al. 2006). Ks values were translated into duplication time in millions of years based on rate of 1 substitution per synonymous site per year, to estimate the evolutionary time of duplicated GohirTALE gene pairs. The duplication time (T) was computed as T = Ks/2λ × 10-6 Mya (approximate rate for clock-like rate λ = 1.5 × 10-8 y for cotton) (Wang et al. 2012).

Synteny analysis

For synteny analysis Circoletto (http://tools.bat.infspire.org/circoletto/) was used to generate a circos plot visualizing the similarity pattern of GohirTALE protein sequences (Darzentas 2010).

Gene structure and conserved motifs analysis

TALE cDNA and genomic sequences were downloaded from CottonFGD (https://cottonfgd.org/). For finding genes structure, the online Gene Structure Display Server (GSDS 2.0) (https://gsds.cbi.pku.edu.cn/) was used which investigate the gene structure by comparing each genomic sequence with its corresponding cDNA sequence. Conserved protein motifs in GohirTALE genes were recognized using MEME suite (version 4.8.2) (http://www.nbcr.net/). The parameters used are as follows: max. no. of motifs-25, optimum width from 6–450 and any no. of repetitions (Bailey et al. 2009).

Gene ontology (GO) annotation

The functional classification of GohirTALE genes and annotation analysis was performed using CottonFGD (https://cottonfgd.org/). GO annotations of TALE genes were associated with GO terms using Molecular Function (MF), Biological Function (BF) and Cellular Component (CC).

Promotor cis-element analysis

For promotor analysis, 2 Kb upstream nucleotide sequence from the start codon of 89 GohirTALE genes were retrieved from CottonFGD (https://cottonfgd.org/). Then, these sequences were subjected to PlantCARE program (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/)to find the potential cis-regulatory elements in our identified TALE genes.

Expression profiles of GhTALE genes

The expression patterns of GohirTALE members under abiotic stresses (salt, heat, cold and drought) and in various tissues were executed by using the RNA-seq data of G. hirsutum available in CottonFGD (https://cottonfgd.org/). The FPKM values were used to generate heatmap of TALE genes expression. Venn diagram was created using the Gene IDs prominently expressed in abiotic stresses and in various tissues from online software, Genvenn (http://habtom.github.io/biojs-vis-genvenn/examples/).

Results

Genome-wide characterization of cotton TALE transcription factors (TFs)

TALE TF encoding genes of G. hirsutum and homologus genes from A. thaliana, T. cacao, C. papaya, G. max, V. venifera, G. arboreum and G. raimondii were analyzed. A total of 432 TALE genes (DNA binding domain, having three extra amino acid residues resulting three-amino-acid-loop-extension of 63 amino acids) were retrieved from Plant Transcription Factor Database. Out of 432 genes, 97 non-redundant protein sequences were obtained, meeting the crucial e-value of 1-e10 by BLASTP. The HMM profile of the Pfam TALE domains (PF05920, PF03789, PF03790, PF03791 and PF07526) were used as reference to assure the presence of TALE domain in cotton genomes. SMART domain analysis was done manually to analyze TALE genes in cotton and it showed the presence of POX or homeodomain, KNOX1, KNOX2 and ELK domains in all genes (Fig. S1). Conclusively, about 89, 45 and 45 candidate TALE genes were identified in G. hirsutum, G. arboreum and G. raimondii respectively. The number of TALE genes in G. hirsutum (AD genome) is twice as compared to G. arboreum (A genome) and G. raimondii (D genome), which was in agreement with their tetraploid and diploid genomes. All identified TALE genes were categorized into three distinct classes i.e., BELL, KNOX1 and KNOX2. Subcellular localization of TALE proteins by Softberry bioinformatics tool revealed that all 89 identified TALE proteins were found to be localized in nucleus except two proteins, which was predicted to be present in plasma membrane.

Analysis of different physiochemical parameters by ExPASy showed the significant variation among cotton TALE proteins. The mean length of 89 TALE proteins is 489.81 amino acids ranging from 298–803 amino acids, and the average molecular weight is 54.443 kDa (32.837 to 88.584 kDa). BELL class proteins have higher average length (622.87aa) and average molecular weight (68.73 kDa) than KNOX1 (355.83aa and 39.74 kDa) and KNOX2 (329.71aa and 37.48 kDa). The hypothetical isoelectric points among these proteins vary greatly with maximum and minimum value of 8.936 and 4.427, having significant variation in KNOX1 members (4.471–8.416). Other classes have theoretical isoelectric point between 5.642–8.936 (BELL class) and 5.257–7.041 (KNOX2 class). The grand average of hydropathicity (GRAVY) varied from -0.998 to -0.382, this negative value suggested that these proteins possessed hydrophilic nature.

Phylogenetic analysis of TALE genes in cotton

On the basis of protein sequence alignment and phylogenetic tree analysis (neighbor-joining) of G. hirsutum, A. thaliana, T. cacao, G. arboreum and G. raimondii, 218 TALE genes were divided into 17 distinct subgroups such as BLH1, BLH2, BLH6, BLH7, BLH8, BLH9, BLH11, BEL1, KNAT2, KNAT3, KNAT4, KNAT6, KNAT7, KN1, OSH6, H1 and ATH1. The size of subgroups ranged from 6 to 28 genes in ATH1 and BLH1 subgroup respectively (Fig. 1). The TALE genes of G. hirsutum were distributed among various subgroups as: BLH1(12), BLH2(5), BLH6(6), BLH7(2), BLH8(4), BLH9(6), BLH11(4), BEL1(6), KNAT2(4), KNAT3(8), KNAT4(4), KNAT6(6), KNAT7(6), KN1(2), OSH6(6), H1(6) and ATH1(2). Phylogenetic analysis of subgroups was supported by the similarity in exhibition of GohirTALE protein domain repeats within the subgroups. BLH9 subgroup was clustered at two different positions in the phylogenetic tree, BLH9(A) clade have genes of G. hirsutum, arboreum and raimondii while the BLH9(B) cluster exhibited the phylogenetic relationship of Gossypium, Arabidopsis and cacao genes. Another NJ phylogenetic tree was built using the protein sequences of G. hirsutum (Fig. S2). Cotton TALE genes were also clustered in 17 subgroups as in phylogenetic tree of five species. In this tree, genes in subgroup BLH9 were clustered at one position.

 

Chromosomal mapping and duplication of TALE genes

 

Chromosomal position of all 89 TALE members was obtained from CottonFGD (https://cottonfgd.org/). Thus, 88 TALE genes were mapped onto all 26 chromosomes (A01–A13 and D01–D13) of upland cotton. However, one TALE member was not apparently mapped onto any chromosome and was located on a scaffold. The genes were unevenly distributed on tetraploid cotton chromosomes with varying density on different chromosomes and chromosomal regions (Fig. 2). Although, the distribution of genes on A chromosomes were approximately same as that of D chromosomes. The remarkable density of genes was notified on A05 chromosome and its homolog D05 chromosomes with eight genes on each chromosome, and the minimum density of TALE genes was found on A02 chromosome with only one gene. Collectively, a higher number of genes were mapped on D genome in comparison with A genome having 45 and 43 genes, respectively.

Duplication analysis revealed that segmental duplication is the leading cause of expansion of TALE gene family in upland cotton. Segmental duplication occurs when duplicated genes are located on different chromosomes. In this analysis, 34 duplicated GohirTALE gene pairs were experienced segmental duplication, including 15 duplication events within A and D chromosomes whereas 19 duplication events between A, D chromosomes and scaffold. Moreover, evolution time of identified cotton TALE genes was calculated by computed their synonymous (Ks) and non-synonymous (Ka) substitution rates using PAL2NAL. The Ka/Ks ratio is a parameter used to predict the gene duplication mechanism after being diverged from their ancestors. The Ka/Ks ratio of 1 indicates neutral selection, value less than 1 suggests purifying selection while value greater than 1 predicts positive selection (Hurst et al. 2002). In this study, we computed Ka/Ks ratios for 34 pair of segmentally duplicated genes. Ka/Ks ratios for all gene pairs were found to be less than 1, indicated that cotton TALE genes have undergone purifying selection. Furthermore, Ks value was used to determine the gene duplication time of GohirTALE genes during the evolution of cotton genome. The segmental duplication event in Upland cotton occurred between 1.12 (Ks=0.0338) and 25.88 mya (Ks=0.7765) with an average of 17.923 mya (million years ago) (Table S1). It is concluded that TALE genes expansion in Upland cotton is mostly occurred due to whole genome duplication events during the course of evolution.

Synteny analysis

Circos plot was developed for synteny analysis using online software Circoletto in which arcs of different colours show varying percentage of similarity between the sequences (Fig. 3). Such as blue arcs showed least similarity (upto 25%) followed by green shows similarity upto 50% and orange for 75% similarity while red is on the top representing similarity above 75% to 100%. The extent of similarity increases with the increasing bit score in resulted local alignment. Most of the genes are orthologs of each other such as Gohir. 1Z073200 is an ortholog of Gohir. D07G022500, resulted due to gene duplication. The green ends represent the N-terminal whereas red end shows the C-terminal. The small blue ribbons show the domains and also represent their presence on N-terminal or C-terminal.

Structural features and motif distribution in TALE genes

Evolution of multigene families is possible under the mechanism of structural diversity of gene and divergence of conserved motif (Wang et al. 2013). Therefore, evolutionary relationships of TALE protein families were examined with the combination of intron/exon structure and motif distribution (Fig. 4). This analysis revealed that intron/exon structure within the TALE domain is highly conserved. Approximately, 93.25% TALE genes possessed 2 exons with variation in exon lengths between and within different subfamilies, while remaining 6.75% genes contained one or no exon. Contrary to this, intron number was characterized according to KNOX and BELL-like domain, as majority of the genes in KNOX domain have 4 introns except subgroup KN1 and H1 (with 3 introns). However, BELL-like domain is characterized by 3 introns with little variation such as BLH1 subgroup having three genes with two introns. It is also observed that variation in intron length was highly specific in close members within subgroups.

Conserved protein motifs were explored by MEME for 89 identified cotton TALE protein sequences. Twenty-five distinguished motifs were identified. Motif 1 is present in all TALE genes which is annotated homeodomain, while motif 3 and 4 are characteristic motif of KNOX domain. Similarly, BELL-like domain has motif 2, 5 and 6, which are conserved in this domain. In KNOX domain, motif 20 and 21 is highly conserved in KNOX1 except subgroup OSH6, while KNOX2 have motif 7, 8 and 13 as conserved motifs. Motif 5, 9, 10, 15 and 18 are frequently found in BELL- like proteins while motif 14 is specifically present in subgroup BLH6. Smallest subgroup ATH1 is distinguished by the presence of motif 19. Motif 16 is conserved in BLH2, BLH8 and BLH9. In phylogenetic tree, BLH9 (A) is diverged separately from BLH9 (B) due to the absence of motif 15. Motif 20 of KNOX1 is also found in BELL-like subgroup BLH11, which verifies the interaction of both the domains. It is more evident that proteins belonging to same subgroup, ported similar conserved motif, proposing that those proteins have functional similarity.

Gene ontology (GO) annotation

Gene Ontology (GO) database was used to examine the biological function, molecular function and cellular components of cotton TALE genes. The obtained GO terms for TALE genes from GO database are GO:0003677, GO:0043565, GO:0006355 and GO:000634. The results concluded that 89 TALE genes were prominently involved in molecular function such as DNA binding and sequence-specific DNA binding. All TALE genes are involved in a single biological function, mentioned as regulation of transcription. However, only 41 TALE genes categorized as cellular component, which are predicted to be allocated in nucleus (Fig. 5). G. hirsutum TALE genes were greatly involved in molecular and biological function but few genes were involved in cellular component. It was observed that TALE genes with KNOX domain were involved only in cellular component.

Expression analysis GohirTALE genes 

The expression patterns of G. hirsutum TALE members were examined against abiotic stresses such as heat, cold, salt and drought using publicly available transcriptomic data (Zhang et al. 2015). Genes belong to distinct subgroups (KNAT3, KNAT4, BLH1 and BEL1) had considerable expression levels at different time intervals of stress treatments (Fig. 6a). In order to illustrate the expression trends of TALE genes under each abiotic stress, we created a venn diagram. The diagram showed the number of genes expressed in each abiotic stress as 54 out of 89 genes exhibited significant expression under salt stress, 52 genes sowed upregulation in response to heat stress, 51 genes were highly expressed in drought and 50 genes were expressed in response to cold (Fig. 7a). Besides this, expression pattern of TALE genes in various tissues was also analyzed to comprehensively understand the possible cause of expression against abiotic stresses (Fig. 6b). Majority of the cotton TALE genes were prominently expressed in leaf, root and flower parts such as stamen and pistil. Out of 89 TALE genes, 73 genes were highly expressed in leaf, 72 in roots and 79 genes collectively expressed in pistil and stamen. (Fig. 7b) On the basis of higher expression level against abiotic stress, 18 genes (about 20% of the total genes) were screened, which could be further analyzed.

 

Promoter analysis of GohirTALE genes

 

cis- regulatory elements in promotor region of genes plays a crucial function in regulation of gene’s expression. They are found in upstream regions of transcription start codon. To investigate the cis-elements associated with the regulation of cotton TALE genes, we selected 2 kb upstream sequence from 5’ flanking end of the transcriptional start codon. The PLANTCARE database analyzed various phytohormone, stress, light responsive and developmental cis-regulatory elements in the promotor regions of the query genes (4. Fig. S3). Out of total 89 GohirTALE genes, 58 contained developmental responsive or tissue-specific elements, 23 had circadian responsive element, 26 with stress and defense related elements (TC-rich repeats), with 76 genes possessed ARE (anaerobic induction responsive element), 43 carried drought inducibility element (MBS) and 32 harbored LTR (low temp. responsive elements). All identified TALE genes were enriched with light responsive elements. Furthermore, many TALE genes also possessed phytohormonal cis-regulatory elements such as ABRE, CGTCA/TGACG, GARE/P-box, TGA and TCA elements which regulate the abscisic acid (ABA), jasmonic acid (MeJA), gibberellic acid (GA), auxins and salicylic acid signaling, respectively. Promotor analysis for 18 highly expressed genes depicted the presence of prominent elements responsive against abiotic stresses in those genes (Fig. 8).

Discussion

The TALE family is a widely found transcription factor family in eukaryotes, specifically in plants. The function of this family is plant growth and development, and is also involved in hormone homeostasis in different plant species. Among the two main subfamilies KNOX and BELL of TALE, Class1 KNOX genes are mainly found to be involved in shoot apical meristem (SAM) development and maintenance by regulating phytohormones, while class2 KNOX genes are expressed in different plant tissues including lateral organs i.e., roots, leaves and meristems, BELL genes are found to be involved in ovule development and lateral organ formation (Fig. 9) (Gonzalez 2015). Recently, genome-wide study of TALE family has been reported in poplar regarding its role in response to salinity (Zhao et al. 2019). However, characterization of TALE family and its functional illustration in stress tolerance is still not much explored in cotton genome. In this study, we comprehensively analyzed the phylogenetic relationships, structural features, motif distribution, chromosomal mapping, duplication events and expression patterning of TALE genes in cotton. Results showed that each of the two diploid cotton genomes (G. arboreum and G. raimondii) possessed 45 TALE genes, while G. hirsutum contained 89 TALE genes, comparative to the results of Ma et al. (2019), in which 94 TALE genes were identified in G. hirsutum. This difference in number of identified genes might be due to use of different genome assemblies of three cotton species, i.e., JGI (https://cottonfgd.org) and NAU (https://www.cottongen.org/assemblies). Further, in our study we use five reference genomes including A. thaliana, T. cacao, C. papaya, G. max, V. venifera, (http://planttfdb.cbi.pku.edu.cn/) for retrieval of TALE homologs in three cotton species while comparing with the study of Ma et al. (2019). Further, these findings are in consistent with the historical evolution of tetraploid cotton, as it evolved under the influence of whole genome duplication events (Zhang et al. 2015). Phylogenetic analysis also represents the close relationship of upland cotton genes with G. arboreum and G. raimondii in terms of block locations. While the abundance of G. hirsutum TALE genes is due to the occurrence of whole genome duplication. Out of total 89 cotton TALE genes, 42 are KNOX genes while 47 are BELL-like genes, as previously described in Arabidopsis and poplar (Hamant and Pautot 2010; Zhao et al. 2019). Phylogenetic tree was divided into 17 subgroups according to the name of genes in each subgroup. All genes are localized in the nucleus, which is in consistent with localization prediction of TALE genes in poplar (Zhao et al. 2019), except two genes located in plasma membrane that might suggests the diverse distribution of cotton TALE genes in cell compartments. Further, 88 of these genes are non-uniformly distributed on 26 chromosome of tetraploid cotton (13 chromosomes of each A-sub-genome and D-sub-genome) and one gene is mapped on a scaffold. These results are in agreement with the distribution of GRAS family genes in cotton genome (Zhang et al. 2018). We categorized TALE proteins in three classes according to their domains as mentioned earlier in Arabidopsis (Hamant and Pautot 2010). KNOX genes are divided into two groups (KNOX1 and KNOX2) whereas BELL genes lie in same group. The domain analysis using SMART database is congruent with our domain classification. Verification from the physiochemical properties of cotton TALE proteins, it is concluded that all proteins are hydrophilic in nature, but differ greatly in other aspects. Protein length and molecular weight is higher in BELL proteins as compared to KNOX proteins, whereas isoelectric point varies significantly in KNOX1 members, which is in contrast with the findings in poplar (Zhao et al. 2019).

Gene duplication is a characteristic feature of genome architecture, which insights an important role in plant genomic evolution, resulting the formation of new raw genetic materials that can be influenced by selection, mutation and genetic drift, ultimately evolves with new gene functions and gene interacting networks (Flagel and Wendel 2009). Adaptability and evolution of plants is ensured by gene duplication which leads to the genome expansion with diversified gene functions (Xu et al. 2012). The creation of new gene functions resulted due to gene duplication event during the evolution of cotton (Rong et al. 2010). In upland cotton, multigene families were originated specifically due to region-specific gene duplication (Li et al. 2015). In our study, 34 pairs of TALE genes were found segmentally duplicated as analyzed by their Ka/Ks ratio, revealed that the expansion of cotton TALE genes was solely under the influence of segmental duplication. These results are in consistent with the other gene families in upland cotton such as GRAS family in cotton, and LEA family in cotton, Arabidopsis and Brassica (as 72 out of 108 genes expanded through segmental duplication) (Hundertmark and Hincha 2008; Liang et al. 2016; Magwanga et al. 2018; Zhang et al. 2018). Further, syntenic analysis exposed that 13 TALE gene pairs had high similarity, which confirms that GohirTALE genes are embodied in highly conserved regions of tetraploid cotton genome which could be lost or recovered within these conserved regions as observed in other gene families (Yu et al. 2014)

Gene structural analysis showed that cotton TALE genes within same subgroup shares similar intron/exon structure inline to previous studies in Arabidopsis, grapes and rice for MYB gene family (Jiang et al. 2004; Matus et al. 2008). Evolution of multigene families is possible under the mechanism of structural diversity of gene and divergence of conserved motif (Wang et al. 2013). In our study, we found 2 to 5 introns in all TALE genes in cotton, having very less number of introns which is similar to the results of poplar TALE family (Zhao et al. 2019). It is also found that genes responsive to biotic and abiotic stresses have lesser number of introns, which is also previously noted in cotton and poplar LEA gene family (Lan et al. 2013; Magwanga et al. 2018). Trehalose-6-phosphate synthase gene family also carried less introns and involved in abiotic stress responses and regulation of metabolism (Xie et al. 2015). These previous studies regarding gene structure provide better understanding of functional basis of TALE genes in cotton. Motif analysis identified the conserved motifs associated with various TALE domains, showing the association of these proteins with many different functions. This provides a platform for finding new probes in future. GohirTALE proteins showed similar motif arrangement within a subgroup, provides an evidence of involvement of similar motifs in a specific function performed by a group of identical genes (Feller et al. 2011).

According to previous studies, TALE gene family is crucially involved in regulating plant growth and development, so we constructed a heatmap depicting the expression of TALE genes in different plant tissues by using transcriptome data. We also observed expression patterns of cotton TALE genes against various abiotic stresses, concluding that higher number of genes expressing under salt stress. Here, we screened 18 TALE genes having noticeable expression against all types of abiotic stresses. Among these 18 TALE genes, 2 genes belonged to KNAT4, 6 genes from KNAT3 and 5 genes each from BLH1 and BEL1. Then we annotated these screened genes with the Arabidopsis genome to comprehensively understand their functions using TIAR (Lamesch et al. 2012). The homologous gene of BEL1 is AT5G41410 in Arabidopsis, associated with ovule development (Robinson-Beers et al. 1992; Modrusan et al. 1994). AT5G11060 is homologue of cotton KNAT4 genes, involved in root development and influenced by various hormones which effect root growth and development (Truernit and Haseloff 2007). Moreover, the homologue for BLH1 and KNAT3 genes are AT2G35940 and AT5G25220 in model plant Arabidopsis. These genes are involved in embryo sac development but additionally regulate the ABA signaling pathways and hypersensitive to salt stress (Hoth et al. 2002; Kim et al. 2013). During salt and cold stress responsiveness, down regulation of auxin by KNOX genes triggered by auxin responsive factors (ARF) adversely affect the lateral root production. Similarly, the up-regulation of ABA in the presence of ABA responsive elements (ABRE) in the promoter region of BELL genes in response to cold and osmotic stress may cause poor lateral root development (Fig. 9). We also compared the expression profiles of screened genes under abiotic stress against expression in various tissues, and concluded that genes with higher expression under stresses were significantly expressed in roots, leaf and pistil. This conclusion provides greater insight for future cotton breeding against various stresses.

Gene promoters have cis-regulatory elements, importantly involves in transcriptional regulation of various developmental, stress related and phytohormonal responsive genes. It is exemplified that expression of a gene crucially depends upon the presence and absence of these elements (Biłas et al. 2016). Plant hormones provide better adaptability to plants under changing environmental circumstances. Several plant-hormone and stress related elements such as ABRE, MBS, HSE, CGTCA, TCA-elements and GARE have been reported (Kim et al. 2011). In our study, we also found all mentioned and other cis-elements in promoter region of GohirTALE genes. Results showed the presence of more than four abiotic stress responsive elements in TALE genes which are concurrent with the investigation of cis-element analysis of stress related gene families (LEA and LOX family) in upland cotton (Magwanga et al. 2018; Shaban et al. 2018). These evidences strongly suggested that TALE genes might have novel functions against several abiotic stresses.

Conclusion

In this study, 432 TALE members were identified collectively in G. hirsutum, G. arboreum, G. raimondii, A. thaliana, T. cacao, V. venifera, G. max and C. papaya. The TALE transcription factor family was categorized into 17 subgroups based on the phylogenetic tree analysis. Proteins belonged to same subgroups exhibit similar gene structure and protein motif distribution. Chromosomal mapping revealed that GohirTALE members were distributed unevenly on entire tetraploid cotton genome. Gene duplication study confirmed the expansion of TALE genes by segmental duplication and had undergone purifying selection. Furthermore, GO annotation results explained that TALE genes are greatly involved in the transcriptional regulation. Finally, expression profiles of GohirTALE genes under abiotic stresses and in various tissues were revealed using RNA-sequencing data. To support the expression profiles of TALE genes, promoter analysis further revealed the presence of various developmental, phytohormonal and stress responsive cis-elements in it. In conclusion, this study provides comprehensive information about TALE genes and their putative roles in development, growth and abiotic stress tolerance in cotton.

Acknowledgements

The current study is part of M. Phil. research conducted by Ayesha Razzaq and study was supported by the research grant for Department of Plant Breeding and Genetics, Bahauddin Zakariya University, Multan

References

Artimo P, M Jonnalagedda, K Arnold, D Baratin, G Csardi, ED Castro, S Duvaud, V Flegel, A Fortier, E Gasteiger, A Grosdidier, C Hernandez, V Ioannidis, D Kuznetsov, R Liechti, S Moretti, K Mostaguir, N Redaschi, G Rossier, I Xenarios, H Stockinger (2012). ExPASy: SIB bioinformatics resource portal. Nucl Acids Res 40:597‒603

Bailey TL, M Boden, FA Buske, M Frith, CE Grant, L Clementi, J Ren, WW Li, WS Noble (2009). MEME Suite: Tools for motif discovery and searching. Nucl Acids Res 37:202‒208

Bellaoui M, MS Pidkowich, A Samach, K Ksuhalappa, SE Kohalmi, Z Modrusan, WL Crosby, GW Haughn (2007). The Arabidopsis BELL1 and KNOX TALE Homeodomain Proteins Interact through a Domain Conserved between Plants and Animals. Plant Cell 13:24‒55

Biłas R, K Szafran, K Hnatuszko-Konka, AK Kononowicz (2016). Cis-regulatory elements used to control gene expression in plants. Plant Cell Tiss Org Cult 127:269‒287

Briesemeister S, J Rahnenführer, O Kohlbacher (2010). YLoc-an interpretable web server for predicting subcellular localization. Nucl Acids Res 38:497‒502

Bruglin T (1994). A comprehensive classification of homeobox genes. In: Guidebook to the Homeobox Genes, pp: 25‒71

Bürglin TR (1997). Analysis of TALE superclass homeobox genes (MEIS, PBC, KNOX, Iroquois, TGIF) reveals a novel domain conserved between plants and animals. Nucl Acids Res 25:4173‒4180

Chen H, FM Rosin, S Prat, DJ Hannapel (2003). Interacting transcription factors from the three-amino acid loop extension superclass regulate tuber formation. Plant Physiol 132:1391‒1404

Darzentas N (2010). Circoletto: Visualizing sequence similarity with Circos. Bioinformatics 26:2620‒2621

Desplan C, J Theis, PH O’Farrell (2008). The sequence specificity of homeodomain-DNA interaction. Bone 23:1‒7

Feller A, K MacHemer, EL Braun, E Grotewold (2011). Evolutionary and comparative analysis of MYB and bHLH plant transcription factors. Plant J 66:94‒116

Finn RD, A Bateman, J Clements, PCoggill, RY Eberhardt, SR Eddy, A Heger, K Hetherington, L Holm, J Mistry, ELL Sonnhammer, J Tate, M Punta (2014). Pfam: The protein families database. Nucl Acids Res 42:222‒230

Flagel LE, JF Wendel (2009). Gene duplication and evolutionary novelty in plants. New Phytol 183:557‒564

Gao J, H Peng, X He, M Luo, Z Chen, H Lin, H Ding, G Pan, Z Zhang (2014). Molecular phylogenetic characterization and analysis of the WRKY transcription factor family responsive to Rhizoctonia solani in maize. Maydica 59:32‒41

Gong SY, GQ Huang, X Sun, L X Qin, Y Li, L Zhou, XB Li (2014). Cotton KNL1, encoding a class II KNOX transcription factor, is involved in regulation of fibre development. J Exp Bot 65:4133‒4147

Gonzalez DH (2015). Plant Transcription Factors: Evolutionary, Structural and Functional Aspects, pp: 215‒221. Academic Press, London

Hake S, HMS Smith, H Holtan, E Magnani, G Mele, J Ramirez (2004). The role of Knox genes in plant development. Annu Rev Cell Dev Biol 20:125‒151

Hamant O, V Pautot (2010). Plant development: A TALE story. Comptes Rendus – Biol 333:371‒381

Hareven D, T Gutfinger, A Parnis, Y Eshed, E Lifschitz (1996). The making of a compound leaf: genetic manipulation of leaf architecture in tomato. Cell 84:735‒744

Hoth S, M Morgante, JP Sanchez, MK Hanafey, SV Tingey, NH Chua (2002). Genome-wide gene expression profiling in Arabidopsis thaliana reveals new targets of abscisic acid and largerly impaired gene regulation in the abi1-1 mutant. J Cell Sci 115:4891‒4900

Hundertmark M, DK Hincha (2008). LEA (late embryogenesis abundant) proteins and their encoding genes in Arabidopsis thaliana. BMC Genomics 9; Article 118

Hurst LD, EJB Williams, C Pál (2002). Natural selection promotes the conservation of linkage of co-expressed genes. Trends Genet 18:604‒606

Jiang C, X Gu, T Peterson (2004). Identification of conserved gene structures and carboxy-terminal motifs in the Myb gene family of Arabidopsis and Oryza sativa L. spp. indica. Genome Biol 5:1‒11

Jin J, F Tian, DC Yang, YQ Meng, L Kong, J Luo, G Gao (2017). PlantTFDB 4.0: Toward a central hub for transcription factors and regulatory interactions in plants. Nucl Acids Res 45: 1040‒1045

Kerstetter RA, O Laudencia-Chingcuanco, LG Smith, S Hake (1997). Loss-of-function mutations in the maize homeobox gene, knotted1, are defective in shoot meristem maintenance. Development 124:3045‒3054

Kim D, YH Cho, H Ryu, Y Kim, TH Kim, I Hwang (2013). BLH1 and KNAT3 modulate ABA responses during germination and early seedling development in Arabidopsis. Plant J 75:755‒766

Kim JS, J Mizoi, T Yoshida, Y Fujita, J Nakajima, T Ohori, D Todaka, K Nakashima, T Hirayama, K Shinozaki, KY Shinozaki (2011). An ABRE promoter sequence is involved in osmotic stress-responsive expression of the DREB2A gene, which encodes a transcription factor regulating drought-inducible genes in Arabidopsis. Plant Cell Physiol 52:2136‒2146

Kimura S, D Koenig, J Kang, FY Yoong, N Sinha (2008). Natural variation in leaf morphology results from mutation of a novel KNOX gene. Curr Biol 18:672‒677

Lamesch P, TZ Berardini, D Li, D Swarbreck, C Wilks, R Sasidharan, R Muller, K Dreher, DL Alexander, MG Hernandez, AS Karthikeyan, CH Lee, WD Nelson, L Ploetz, S Singh, A Wensel, E Huala (2012). The Arabidopsis Information Resource (TAIR): Improved gene annotation and new tools. Nucl Acids Res 40:1202‒1210

Lan T, J Gao, QY Zeng (2013). Genome-wide analysis of the LEA (late embryogenesis abundant) protein gene family in Populus trichocarpa. Tree Genet Genomics 9:253‒264

Letunic I, T Doerks, P Bork (2015). SMART: Recent updates, new developments and status in 2015. Nucl Acids Res 43:257‒260

Li F, G Fan, C Lu, G Xiao, C Zou, RJ Kohel, Z Ma, H Shang, X Ma, J Wu, X Liang, G Huang, RG Percy, K Liu, W Yang, W Chen, X Du, C Shi, Y Yuan, W Ye, X Liu, X Zhang, W Liu, H Wei1, S Wei1, G Huang, X Zhang, S Zhu, H Zhang, F Sun, X Wang, J Liang, J Wang, Q He, L Huang, J Wang, J Cui1, G Song, K Wang, X Xu, JZ Yu, Y Zhu, S Yu (2015). Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. Nat Biotechnol 33:524‒530

Li KB (2003). ClustalW-MPI: ClustalW analysis using distributed and parallel computing. Bioinformatics 19:1585‒1586

Liang Y, Z Xiong, J Zheng, D Xu, Z Zhu, J Xiang, J Gan, N Raboanatahiry, Y Yin, M Li (2016). Genome-wide identification, structural analysis and new insights into late embryogenesis abundant (LEA) gene family formation pattern in Brassica napus. Sci Rep 6; Article 24265

Ma Q, N Wang, P Hao, H Sun, C Wang, L Ma, H Wang, X Zhang, H Wei, S Yu (2019). Genome-wide identification and characterization of TALE superfamily genes in cotton reveals their functions in regulating secondary cell wall biosynthesis. BMC Plant Biol 19; Article 432

Magnani E, S Hake (2008). KNOX lost the OX: The Arabidopsis KNATM gene defines a novel class of KNOX transcriptional regulators missing the homeodomain. Plant Cell 20:875‒887

Magwanga RO, P Lu, JN Kirungu, H Lu, X Wang, X Cai, Z Zhou, Z Zhang, H Salih, K Wang, F Liu (2018). Characterization of the late embryogenesis abundant (LEA) proteins family and their role in drought stress tolerance in upland cotton. BMC Genet 19; Article 6

Matus JT, F Aquea, P Arce-Johnson (2008). Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes. BMC Plant Biol 8; Article 83

Modrusan Z, L Reiser, KA Feldmann, RL Fischer, GW Haughn (1994). Homeotic transformation of ovules into carpel-like structures in Arabidopsis. Plant Cell 6:333‒349

Nam J, M Nei (2005). Evolutionary change of the numbers of homeobox genes in bilateral animals. Mol Biol Evol 22:2386‒2394

Nuruzzaman M, R Manimekalai, AM Sharoni, K Satoh, H Kondoh, HKS Ooka (2010). Genome-wide analysis of NAC transcription factor family in rice. Gene 465:30‒44

Osorio MB, L Bücker-Neto, G Castilhos, AC Turchetto-Zolet, B Wiebke-Strohm, MH Bodanese-Zanettini, M Margis-Pinheiro (2012). Identification and in silico characterization of soybean trihelix-GT and bHLH transcription factors involved in stress responses. Genet Mol Biol 35:233‒246

Reiser L, P Sanchez-Baracaldo, S Hake (2000). Knots in the family tree: evolutionary relationships and functions of KNOX homeobox genes. Plant Mol Biol 42:151‒166

Robinson-Beers K, RE Pruitt, CS Gasser (1992). Ovule development in wild-type Arabidopsis and two female-sterile mutants. Plant Cell 4:1237‒1249

Rong J, FA Feltus, L Liu, L Lin, AH Paterson (2010). Gene copy number evolution during tetraploid cotton radiation. Heredity 105:463‒472

Shaban M, MM Ahmed, H Sun, A Ullah, L Zhu (2018). Genome-wide identification of lipoxygenase gene family in cotton and functional characterization in response to abiotic stresses. BMC Genomics 19; Article 599

Sievers F, DG Higgins (2018). Clustal Omega for making accurate alignments of many protein sequences. Protein Sci 27:135‒145

Suyama M, D Torrents, P Bork (2006). PAL2NAL: Robust conversion of protein sequence alignments into the corresponding codon alignments. Nucl Acids Res 34:609‒612

Truernit E, J Haseloff (2007). A role for KNAT class II genes in root development. Plant Signal Behav 2:10‒12

Wang K, Z Wang, F Li, W Ye, J Wang, G Song, Z Yue, L Cong, H Shang, S Zhu, C Zou, Q Li, Y Yuan, C Lu, H Wei1, C Gou, Z Zheng, Y Yin, X Zhang, K Liu, B Wang, C Song, N Shi, RJ Kohel, RG Percy, JZ Yu, YX Zhu, J Wang, S Yu (2012). The draft genome of a diploid cotton Gossypium raimondii. Nat Genet 44:1098‒1103

Wang N, Y Zheng, H Xin, L Fang, S Li (2013). Comprehensive analysis of NAC domain transcription factor gene family in Vitis vinifera. Plant Cell Rep 32:61‒75

Xie DW, XN Wang, LS Fu, J Sun, W Zheng, ZF Li (2015). Identification of the trehalose-6-phosphate synthase gene family in winter wheat and expression analysis under conditions of freezing stress. J Genet 94:55‒65

Xu G, C Guo, H Shan, H Kong (2012). Divergence of duplicate genes in exon-intron structure. Proc Natl Acad Sci USA 109:1187‒1192

Yang S, X Zhang, JX Yue, D Tian, JQ Chen (2008). Recent duplications dominate NBS-encoding gene expansion in two woody species. Mol Genet Genomics 280:187‒198

Yao Q, M Xin, Y Guanghui, Q Wang, L Wang, LKong, W Kim, HW Wang (2014). Evolutionary history of trihelix family and their functional diversification. DNA Res 21:499‒510

Yoon J, LH Cho, SL Kim, H Choi, HJ Koh, G An, (2014). The BEL1-type homeobox gene SH5 induces seed shattering by enhancing abscission-zone development and inhibiting lignin biosynthesis. Plant J 79:717‒728

Yu J, S Tehrim, F Zhang, C Tong, J Huang, X Cheng, C Dong, Y Zhou, R Qin, W Hua, S Liu (2014). Genome-wide comparative analysis of NBS-encoding genes between Brassica species and Arabidopsis thaliana. BMC Genomics 15; Article 3

Yuan, Q., C. Zhang, T. Zhao, M. Yao and X. Xu, 2018. A genome-wide analysis of GATA transcription factor family in tomato and analysis of expression patterns. Intl J Agric Biol 20:1274−1282

Zhang B, J Liu, ZE Yang, EY Chen, CJ Zhang, XY Zhang, FG Li (2018). Genome-wide analysis of GRAS transcription factor gene family in Gossypium hirsutum L. BMC Genomics 19; Article 348

Zhang T, Y Hu, W Jiang, L Fang, X Guan, J Chen, J Zhang, CA Saski, BE Scheffler, DM Stelly, AM Hulse-Kemp, Q Wan, B Liu, C Liu, S Wang, M Pan, Y Wang, D Wang, W Ye, L Chang, W Zhang, Q Song, RC Kirkbride, X Chen, E Dennis, DJ Llewellyn, DG Peterson, P Thaxton, DC Jones, Q Wang, X Xu, H Zhang, H Wu, L Zhou, G Mei, S Chen, Y Tian, D Xiang, X Li, J Ding, Q Zuo, L Tao, Y Liu, J Li, Y Lin, Y Hui, Z Cao, C Cai, X Zhu, Z Jiang, B Zhou, W Guo, R Li, ZF Chen (2015). Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat Biotechnol 33:531‒537

Zhao K, X Zhang, Z Cheng, W Yao, R Li, TJiang, B Zhou (2019). Comprehensive analysis of the three-amino-acid-loop-extension gene family and its tissue-differential expression in response to salt stress in poplar. Plant Physiol Biochem 136:1‒12